GPURFSCREEN: a GPU based virtual screening tool using random forest classifier

Authors

  • P. B. Jayaraj
  • Mathias K. Ajay
  • M. Nufail
  • G. Gopakumar
  • U. C. Abdul Jaleel
Abstract

BACKGROUND In-silico methods are an integral part of the modern drug discovery paradigm. Virtual screening, an in-silico method, is used to refine data models and reduce the chemical space on which wet lab experiments need to be performed. Virtual screening of a ligand data model requires large-scale computations, making it a highly time-consuming task. This process can be sped up by implementing parallelized algorithms on a Graphics Processing Unit (GPU).

RESULTS Random Forest is a robust classification algorithm that can be employed in virtual screening. A ligand-based virtual screening tool (GPURFSCREEN) that uses random forests on GPU systems is proposed and evaluated in this paper. This tool produces optimized results at a lower execution time for large bioassay data sets. The quality of the results produced by our tool on a GPU is the same as that obtained in a regular serial environment.

CONCLUSION Considering the magnitude of data to be screened, the parallelized virtual screening has a significantly lower running time at high throughput. The proposed parallel tool outperforms its serial counterpart by successfully screening billions of molecules in training and prediction phases.
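The abstract does not show how GPURFSCREEN is implemented, so the following is only a minimal CPU sketch of the general idea it describes: train a random forest on labelled ligand descriptors, then classify each molecule of a library by majority vote. Everything here is an assumption made for illustration: the function names (train_stump, train_forest, predict_one, screen), the synthetic two-feature descriptors, and the use of one-split trees (decision stumps) in place of full decision trees. The point to notice is that each molecule is classified independently of the others, which is the property that makes the screening step a natural candidate for parallel execution.

```python
import random
from collections import Counter


def train_stump(rows, labels, rng):
    """Fit a one-split tree on one randomly chosen feature (illustrative only)."""
    feature = rng.randrange(len(rows[0]))
    values = sorted({row[feature] for row in rows})
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    best = None
    for threshold in candidates:
        for above in (0, 1):  # class predicted when the value exceeds the threshold
            errors = sum(
                (above if row[feature] > threshold else 1 - above) != label
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, above)
    _, threshold, above = best
    return feature, threshold, above


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample of the training set."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        forest.append(train_stump(sample_rows, sample_labels, rng))
    return forest


def predict_one(forest, row):
    """Majority vote over all stumps for a single molecule."""
    votes = Counter(
        above if row[feature] > threshold else 1 - above
        for feature, threshold, above in forest
    )
    return votes.most_common(1)[0][0]


def screen(forest, library):
    """Classify every molecule; each call to predict_one is independent."""
    return [predict_one(forest, row) for row in library]


# Synthetic, hypothetical descriptors: actives (label 1) have high values,
# inactives (label 0) have low values on both features.
actives = [[0.80 + 0.01 * i, 0.90 - 0.01 * i] for i in range(10)]
inactives = [[0.10 + 0.01 * i, 0.20 - 0.01 * i] for i in range(10)]
descriptors = actives + inactives
labels = [1] * 10 + [0] * 10

forest = train_forest(descriptors, labels)
hits = screen(forest, [[0.85, 0.85], [0.05, 0.05]])
print(hits)
```

Because the list comprehension in screen has no dependency between iterations, the same loop could in principle be mapped to one GPU thread per molecule; whether the paper's tool does exactly that is not stated in the abstract.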

Similar resources

A Random Forest Classifier based on Genetic Algorithm for Cardiovascular Diseases Diagnosis (RESEARCH NOTE)

Machine learning-based classification techniques support decision making in healthcare, especially in disease diagnosis, prognosis and screening. Healthcare datasets are voluminous, and their high dimensionality leads to a slower learning rate and higher computational cost. Feature selection is expected to deal with the high dimen...

Semi-Supervised Learning Based Prediction of Musculoskeletal Disorder Risk

This study explores a semi-supervised classification approach that uses random forest as a base classifier to classify the risk of low-back disorders (LBDs) associated with industrial jobs. The semi-supervised classification approach uses unlabeled data together with a small amount of labelled data to create a better classifier. The results obtained by the proposed approach are compared with those o...

The influence of negative training set size on machine learning-based virtual screening

BACKGROUND The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods. RESULTS The impact of this rather neglected aspect of machine learning methods application was examined for sets containing a fixed number of positive and a varying number of negative examples randomly selected from the ZINC database. A...

Bearing Capacity of Shallow Foundations on Cohesionless Soils: A Random Forest Based Approach

Determining the ultimate bearing capacity (UBC) is vital for the design of shallow foundations. Recently, soft computing methods (i.e. artificial neural networks and support vector machines) have been used for this purpose. In this paper, Random Forest (RF) is utilized as a tree-based ensemble classifier for predicting the UBC of shallow foundations on cohesionless soils. The inputs of the model are wi...

Machine Learning based Predictive Model for Screening Mycobacterium Tuberculosis Transcriptional Regulatory Protein Inhibitors from High-Throughput Screening Dataset

In view of the essential role played by dosRS in the survival of Mycobacterium in the infected granuloma cells, dosRS transcriptional regulatory proteins were considered as a validated target for high throughput screening (HTS). However, the cost and time involved in screening large compound libraries are important hurdles in identifying lead compounds. Therefore, the use of computatio...

Journal title:
  • Journal of Cheminformatics

Volume 8

Publication date: 2016